FACULTY SALARY: ISSUES IN MULTIPLE REGRESSIONS



Equity in faculty salaries has always been a controversial issue facing institutions of higher
education. In a bastion of academic freedom where faculty and students pursue knowledge, a
gender gap in salaries seems almost anti-intellectual. More importantly, there are laws that
ensure gender equity in salaries. Thus, salary equity has evolved from a moral issue into a legal
issue. Judicious salary decisions have become all the more important now that claims of
discrimination against faculty can be brought to court.

In order to fend off legal challenges in court, an institution of higher education first has to
determine whether or not there is a salary disparity among faculty on campus. There are various
ways of pursuing this inquiry. Descriptive statistics are informative, yet they lack
inferential power. Inferential statistics such as regression analysis provide more
probative value, yet they pose some difficult statistical problems. The nature of these
problems differs depending on whether one views them statistically or substantively. The purpose
of this paper is to discuss some major statistical issues that arise when salary studies are
presented in court, and to explore solutions to such problems. Actual data were used in
presenting the problems and solutions.

Law with Regard to Gender Discrimination 

There are two major statutes available for seeking remedy in employment sex
discrimination: the Equal Pay Act and Title VII of the 1964 Civil Rights Act. The essence of
the Equal Pay Act is that employers have to pay equal wages to employees
when the job requires equal skill and responsibility. Merit increase, seniority, and production are
the only exceptions where pay differentials are allowed. The second law is much broader in
scope. Title VII prohibits discrimination in either private or public institutions of higher
education, and is the most widely used statute in litigation. Title VII stipulates that employment
practices should not discriminate against employees on the grounds of religion, race, color or
sex. In Griggs v. Duke Power Co. (1971), the Supreme Court decided that the discriminatory
impact of an employment decision is a major concern, even if the policy is neutral in nature.
This is important because a gender disparity in faculty rank could become an issue of contention
even when the average salaries of both sexes are the same.

The decision to allow the use of statistics in court can be traced back to International
Brotherhood of Teamsters v. U.S. (1977), when the U.S. Supreme Court decided that statistics were
probative of discrimination. Since then, most legal cases have involved statistics, particularly
the use of multiple regressions.

Procedure 

A sample of 357 faculty members was selected from a Catholic institution of higher 
education in the Midwest. Variables included salaries, rank, tenured status, performance 
measures, gender, race, colleges and departments. The actual salary study was performed in 
2004. Illustration of the statistical issues in the paper used the actual data from the 2004 faculty 
salary study. 

Issues of Standardized and Un-standardized Regression Coefficients 

A related issue concerns the impact of variable groupings: the way individuals are placed in
groups may change the relative variances among variables (Langbein & Lichtman, 1978).
Aggregation bias will be reflected in standardized measures, such as correlations and
standardized regression coefficients, but not in un-standardized regression coefficients.

The choice between standardized and un-standardized regression coefficients received
great attention when path analysis was introduced in the social sciences. When path models were
presented, immediate questions were raised as to whether standardized regression coefficients or
un-standardized regression coefficients should be used in a comparison. Aggregation bias and
appropriate comparison are two facets of one issue, both relating to the ratio of s(x) to s(y).

The standardized regression coefficient, usually called a beta coefficient ($\beta$),
is computed by converting both x and y to Z-scores and then estimating the regression
equation. Thus, an un-standardized regression coefficient can be converted into $\beta$ by
multiplying it by the standard deviation of the independent variable and then
dividing by the standard deviation of the dependent variable:

$\beta_{yx} = b_{yx}(s_x / s_y)$

Much discussion of the use of un-standardized regression coefficients and $\beta$ was published in
the path analysis literature. Blalock (1971) argued that $\beta$ was appropriate for describing the
relationship in a single sample, whereas un-standardized regression coefficients should be used
for comparing samples or stating general laws. Most sociologists tend to agree with this
statement (Duncan, 1966; Boudon, 1965; Tukey, 1954). $\beta_{yx}$ is a standardized regression
coefficient indicating the amount of change in the dependent variable, in standard deviation
units, for a change of one standard deviation in the independent variable.

On the other hand, $b_{yx}$ is an un-standardized regression coefficient indicating the amount
of change in the dependent variable, in its own unit of measurement, for a one-unit change in
the independent variable, in its own unit of measurement. Thus, $b_{yx}$ has an invariant
property from sample to sample: if the unit of measurement is dollars or meters, it remains the
same. The magnitudes of $s_x/s_y$ reflect the real variation in a particular sample, and are
therefore subject to change across samples. It then becomes misleading to compare the value of
$\beta_{yx}$ in two different samples when the values of $b_{yx}$ or $s_x/s_y$ are uncertain.
Schoenberg (1972) analyzed Bayer’s data and presented the results reproduced in Table 1.

Bayer’s study (1969) was one of several major studies focusing on educational aspiration,
social and economic status, peer influence upon aspiration and their correlates. In the sociology
of education, the hypotheses that intelligence and family SES are appreciable influences on
aspiration, and that the influence of a friend’s aspiration on a respondent’s aspiration is also
appreciable, have generated much debate on the meaningful comparison of sociological data.

The un-standardized regression coefficients in Table 1 indicated that the effect of SES on
significant others’ influence was roughly two and a half times larger on farms (.048) than in
large cities (.019), yet the standardized regression coefficients suggested a much more similar
effect in the two settings (.174 versus .145). This discrepancy arises because the standard
deviation of SES varies greatly among the different regions (see column 4, Table 1). A faulty
conclusion could be avoided by using un-standardized regression coefficients in the comparison,
as Schoenberg (1972) suggested.

Table 1

Comparison of the Behavior of Standardized Regression Coefficients
and Un-Standardized Regression Coefficients across Residence Categories

1             2                 3      4        5       6       7       8
Farm                            3      6.048    1.677   0.174   0.048   5.854
Village       Under 2,500       3.2    9.851    1.666   0.247   0.042   6.159
Small city    2,500-25,000      3.9    11.267   1.752   0.236   0.036   6.336
Medium city   25,000-100,000    4.7    11.505   1.665   0.232   0.034   6.371
Large city    Over 100,000      5.5    12.226   1.612   0.145   0.019   6.89

1 - Residence
2 - Population
3 - Mean log of population
4 - Standard deviations of SES
5 - Standard deviations of significant others' influence
6 - Standardized regression coefficients of SES on significant others' influence
7 - Un-standardized regression coefficients of SES on significant others' influence
8 - Intercept

Correct use of standardized and un-standardized regression coefficients has a tremendous
influence on the conclusions of salary studies. Unless a meaningful comparison is being made,
one can never be sure whether a gender bias exists that affects rank, tenure and salaries. Since
rank and tenure are each measured in the same units for male and female faculty, un-standardized
regression coefficients should be used in the comparison.

Table 2 was computed on a female sample of 130 faculty members and a male sample of
227. Tenure status is a dummy variable coded as 1 (tenured) or 0 (not tenured). The
standardized regression coefficients ($\beta$) and un-standardized regression coefficients, along
with other pertinent information, are presented in Table 2. Examining the standardized regression
coefficients, one would conclude that the effect of tenure status on salary was greater for
female faculty (.216) than for male faculty (.191). However, when one looks
at the un-standardized coefficients, the effect of tenure status on salary was far greater
for male faculty (9497) than for female faculty (6693), reversing the conclusion drawn from the
standardized regression coefficients. Applying the numbers to the equation
$\beta_{yx} = b_{yx}(s_x/s_y)$, one immediately sees that male faculty salaries varied greatly
(standard deviation 22,591), while female faculty salaries varied far less (standard deviation
15,199). Thus, while the standard deviations of tenure status were nearly the same (.490 versus
.455), the effect of $s_x/s_y$ could offset the effect of $b_{yx}$ and reverse the conclusion
drawn from the un-standardized regression coefficients. The computations are as follows:

.216 = 6693 * .490 / 15,199
.191 = 9497 * .455 / 22,591
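
The reversal described above can be checked with a few lines of Python. This is only a sketch: the function name `standardized_beta` is introduced here for illustration, and the inputs are the coefficients and standard deviations reported in Table 2 and the accompanying descriptive statistics.

```python
def standardized_beta(b, s_x, s_y):
    """Convert an un-standardized slope b into a standardized beta: b * s_x / s_y."""
    return b * s_x / s_y


# Values reported in Table 2 and in the descriptive statistics.
female = {"b": 6693.076, "s_x": 0.49015, "s_y": 15199.239}
male = {"b": 9497.519, "s_x": 0.45511, "s_y": 22591.845}

beta_female = standardized_beta(**female)
beta_male = standardized_beta(**male)

print(f"female: b = {female['b']:.0f}, beta = {beta_female:.3f}")
print(f"male:   b = {male['b']:.0f}, beta = {beta_male:.3f}")

# The two metrics rank the groups in opposite order.
b_favors_male = male["b"] > female["b"]
beta_favors_female = beta_female > beta_male
print("un-standardized slope larger for males:", b_favors_male)
print("standardized beta larger for females:", beta_favors_female)
```

The tenure slope in dollars is larger for male faculty, but dividing by the much larger male salary standard deviation shrinks the male beta below the female beta.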



Table 2

Comparison of Tenure Status upon Salary across Gender

                             Un-standardized Coefficients   Standardized Coefficients
V13      Model               B            Std. Error        Beta                        t        Sig.
Female   1   (Constant)      55450.784    2086.253                                      26.579   .000
             V5A              6693.076    2676.240          .216                         2.501   .014
Male     1   (Constant)      61555.152    2735.550                                      22.502   .000
             V5A              9497.519    3248.214          .191                         2.924   .004

a Dependent Variable: V6, Independent variable: Tenure Status 0 = Not Tenured, 1 = Tenured.



Descriptive statistics for Salary & Tenure Status

V13      Variable             N     Minimum    Maximum     Mean         Std. Deviation
Female   V5A                  130   .00        1.00        .6077        .49015
         V6                   130   $39,390    $128,500    $59,518.12   $15,199.239
         Valid N (listwise)   130
Male     V5A                  227   .00        1.00        .7093        .45511
         V6                   227   $36,260    $185,810    $68,291.28   $22,591.845
         Valid N (listwise)   227


A similar approach was adopted by Duncan (1969), who adamantly advocated the use of
un-standardized regression coefficients in his interpretation of the stratification process in White
and Non-White populations. Personal income measured in dollars was the variable used in the
comparison. The invariant nature of measurement in dollars avoids the unnecessary problems
that accompany the use of standardized regression coefficients.

Multi-Collinearity

Interaction effects and multi-collinearity present special problems in analyzing salary data.
Tenure, rank and merit increases all tend to go together. In Bakewell v. Stephen F. Austin State
University (1996), the court found that the plaintiffs’ models suffered from multi-collinearity
because experience, degree, and gender were all highly related to each other. Even so, the court
stated that the problems were not significant enough to negate the probative value of the
results. The lesson from this decision is that one must determine the nature of the
multi-collinearity. If it is a problem of computation, then one has to drop a variable in order
to estimate the other one: when two variables are perfectly or nearly perfectly correlated, the
matrix cannot be inverted and the regression coefficients of the variables cannot be computed.
Perfect collinearity, however, is extremely rare empirically.

Less-than-perfect multi-collinearity among the variables poses estimation problems.
Ordinary least squares estimates remain un-biased, but the standard errors of the
regression coefficients become large and the estimates imprecise. Furthermore, a high simple
correlation among the variables is not a necessary condition for multi-collinearity.
Multi-collinearity is indicated when, for the individual coefficients, the null hypothesis (that
each coefficient is zero) cannot be rejected, yet the combined set of regression coefficients is
jointly significant in the F-test. In addition, the values of the regression coefficients change
dramatically from sample to sample.

Multi-collinearity refers to a problem in the regression equation that arises when two or more
independent variables are highly correlated. In the social sciences and economics, variables are
generally correlated: income, education, ACT scores, grades, and social status are all highly
correlated. If one treats a variable such as wealth as a dependent variable, one would expect a
high correlation with an independent variable like education. The problem emerges when both
wealth and education are included in the equation as independent variables, because their high
correlation obscures the individual effect of each variable upon a dependent variable such as
prestige. In a severe case, wealth is found significant while education becomes insignificant,
or vice versa, depending solely upon the order in which the variables are entered into the
equation. A more subtle problem emerges when a different sample is chosen. Although the
regression coefficients of the correlated variables remain unbiased, their standard errors are
far larger than they would have been in the absence of multi-collinearity. Because the standard
errors of regression coefficients are used to construct confidence intervals around the sample
point estimates of regression parameters, the larger the standard errors, the wider the
confidence intervals and the less precise the estimates (Knoke, Bohrnstedt and Mee, 2002).

There are two ways of solving the problem, either of which is acceptable to social scientists
in an appropriate setting. One strategy is to drop a highly correlated variable from the model,
and the other is to combine the variables into an index. The Duncan Socioeconomic Index (SEI)
is a combination of income, education, and occupational prestige (Reiss, Duncan, Hatt and North,
1961).



Lewis-Beck (1980) cited a study that illustrates how dropping a variable from the model can
address multi-collinearity. In order to assess the support Peron garnered from workers
and internal migrants in the 1946 Argentine presidential election, sociologist Gino Germani
included the following variables in the equation: Y represented the percentage of the county’s
vote for Peron; $x_1$, urban blue-collar workers (as a percentage of the economically
active population in the county); $x_2$, rural blue-collar workers (as a percentage of the
economically active population in the county); $x_3$, urban white-collar workers (as a percentage
of the economically active population in the county); $x_4$, rural white-collar workers; and
$x_5$, internal migrants (as a percentage of Argentinean-born males). The result of the analysis
with five independent variables was as follows:

$Y = .52 + .18x_1 - .10x_2 - .57x_3 - 3.5x_4 + .29x_5$

The only significant regression coefficient was .29, which suggested that none of the worker
groups influenced support for Peron. However, when each independent variable was regressed
on the remaining independent variables, astounding results were revealed: $R^2_{x_2} = .99$,
$R^2_{x_3} = .98$, $R^2_{x_1} = .98$, $R^2_{x_4} = .75$ and $R^2_{x_5} = .32$. Such a high value
(.99) indicated that multi-collinearity existed. Removing $x_2$ from the equation produced a new
analysis and thus an entirely new perspective on the data.

$Y = .42 + .28x_1 - .47x_3 - 3.07x_4 + .30x_5$

With the exception of the intercept, all the coefficients in the equation were significant at
.05. The conclusion that workers played a major role in Peron’s election was valid. This is an
example where dropping a variable to solve the problem of multi-collinearity was successful.

In salary studies, the problem of multi-collinearity arises when one finds that tenured
status and rank are entangled. It is customary in an institution of higher education to grant
tenure to an assistant professor along with a promotion to the rank of associate professor.
Therefore, rank and tenure status almost go hand in hand. However, dropping either rank or
tenure status would present tremendous difficulties in presenting the data to the jury in court.
In Sobel v. Yeshiva University (1988), the court found that the regression model should include
rank as a variable, and any other variable included in the model has to show relevance to the
variable of rank. In Presseisen v. Swarthmore College (1977), the court rejected the regression
model presented by the plaintiff because the model did not include the variable of rank.

If dropping the variable of rank is not advisable, then some other alternative has to be
sought. As one would expect, prior knowledge about the relationship between tenure status and
rank is readily available. Previous salary data were also available to assess the quantitative
ratio between tenure status and rank. The regression coefficient of tenure status on rank in the
sample was 1.12 in year 2002, 1.13 in year 2001, and 1.11 in year 2000. It is entirely
reasonable to assume that the ratio between tenure status and rank is somewhere between .35 and
.36 within the population, which can be utilized to alleviate the problem of multi-collinearity.

$Y = a + b_1 x_1 + b_2 x_2 + e$
$b_2 = 1.13 b_1$ (b)

let $x_2$ = rank and $x_1$ = tenure status
rankten = 1.13*tenure + rank
salary = a + b*rankten + e (a)

Table 3 Coefficients(a)

                    Un-standardized Coefficients   Standardized Coefficients
Model               B             Std. Error       Beta                        t        Sig.
1   (Constant)      39,005.794    3,145.477                                    12.401   .000
    RANKTEN          7,092.908      811.488        .421                         8.741   .000

a Dependent Variable: V6 Salary

The variable rankten is a composite of tenured status and rank. One can now use
ordinary least squares regression to compute the regression coefficients in equation (a), where
rankten yielded an un-standardized regression coefficient of 7092. This was also the estimate
of the effect of tenured status. The estimate for rank was derived from equation (b),
7092*1.13=8013.
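
The general technique, imposing a known ratio between two coefficients and estimating a single slope on a composite variable, can be sketched as follows. The data are hypothetical toy values, not the study's data, and the helper `simple_ols` is introduced only for this illustration. The composite built here is the one implied by substituting constraint (b) into the two-predictor equation; the weighting of the study's own rankten variable should be taken from the text, not from this sketch.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


K = 1.13  # assumed known ratio b2 / b1, as in constraint (b)

# Hypothetical toy data: tenure dummy (x1) and rank coded 1-3 (x2).
tenure = [0, 0, 0, 1, 1, 1, 1, 1]
rank = [1, 1, 2, 2, 2, 3, 3, 3]
TRUE_A, TRUE_B1 = 40000.0, 7000.0
salary = [TRUE_A + TRUE_B1 * t + K * TRUE_B1 * r for t, r in zip(tenure, rank)]

# Substituting b2 = K * b1 gives: salary = a + b1 * (tenure + K * rank) + e,
# so a single slope on the composite recovers b1, and b2 follows from (b).
composite = [t + K * r for t, r in zip(tenure, rank)]
a_hat, b1_hat = simple_ols(composite, salary)
b2_hat = K * b1_hat

print(f"a = {a_hat:.2f}, b1 = {b1_hat:.2f}, b2 = {b2_hat:.2f}")
```

Because only one slope is estimated, the two collinear predictors no longer compete for the same variation; the price is that the estimate is only as good as the assumed ratio.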

Issues of Interaction 

One of the most serious problems with multi-collinearity arises from the use of multiplicative
terms in the regression analysis. Blalock (1972) indicated that if one defines $x_3 = x_1 x_2$,
then $x_3$ is an exact nonlinear function of $x_1$ and $x_2$. He concluded that the correlations
of $x_3$ with both $x_1$ and $x_2$ were typically very high. Gordon (1968) showed that both the
number of variables and the correlations among the variables have a substantial impact upon the
interpretation of the data. Redundancy refers to the high correlation among the variables, and
repetitiveness refers to the number of variables. The sizes of the regression coefficients were
depressed for a larger set of variables and enlarged for a smaller set of variables.

With equal repetitiveness, variables with less redundancy had larger regression
coefficients and smaller standard errors than variables with more redundancy. Further analysis of
Gordon’s work indicated that the more interaction terms allowed in the equation, the more serious
the problem of multi-collinearity became (Althauser, 1971). The regression coefficient of the
interaction term became smaller relative to the coefficients of the individual terms.

Cronbach (1987) suggested centering the variables by subtracting the mean score from each
individual score. The slope remains the same, while the intercept becomes the predicted value at
the mean of the independent variable. This approach tends to produce smaller correlations
between the individual variables and their interaction variables, yet a complete treatment of
the complex issues raised by Gordon remains unavailable.
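
The effect of centering on the correlation between a variable and its product term can be illustrated with a small sketch. The two predictors below are hypothetical values chosen for illustration only, and `pearson` is a helper written for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictors (not the study's data).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]

# Correlation of x1 with the raw product term x1 * x2.
raw_product = [a * b for a, b in zip(x1, x2)]
r_raw = pearson(x1, raw_product)

# Center each predictor at its mean, then form the product term.
mean1, mean2 = sum(x1) / len(x1), sum(x2) / len(x2)
c1 = [a - mean1 for a in x1]
c2 = [b - mean2 for b in x2]
centered_product = [a * b for a, b in zip(c1, c2)]
r_centered = pearson(c1, centered_product)

print(f"raw:      r(x1, x1*x2) = {r_raw:.3f}")
print(f"centered: r(c1, c1*c2) = {r_centered:.3f}")
```

With these symmetric toy values the centered correlation drops essentially to zero; with real data the reduction is usually substantial but not complete.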

The problem of multi-collinearity should be placed in perspective. The seriousness of the
multi-collinearity issue is reflected in the variance inflation factor (VIF), a measure based
on the multiple correlations among the independent variables: $VIF_j = 1/(1 - R_j^2)$, where
$R_j^2$ stands for the squared multiple correlation of the jth variable with the other
independent variables in the equation (Johnston, 1972). Moderate correlation (less than .50)
among the independent variables was of no great theoretical significance in the study,
especially when one is careful with the interpretation of inflated standard errors.
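
As a worked illustration of the variance inflation factor, the sketch below applies the formula to the auxiliary R-squared values quoted earlier for the Peron example. The function name `vif` is introduced here for illustration; the square root of the VIF is the factor by which a coefficient's standard error is inflated relative to the case of uncorrelated predictors.

```python
def vif(r_squared):
    """Variance inflation factor for a predictor, given the R-squared of the
    auxiliary regression of that predictor on the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Auxiliary R-squared values quoted in the text for the Peron example.
aux_r2 = {"x1": 0.98, "x2": 0.99, "x3": 0.98, "x4": 0.75, "x5": 0.32}

vifs = {name: vif(r2) for name, r2 in aux_r2.items()}
for name, value in vifs.items():
    print(f"{name}: VIF = {value:6.1f}, standard error inflated {value ** 0.5:4.1f}x")
```

The variable that was dropped in that example, x2, has by far the largest inflation factor, while x5 is barely affected.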

The sample data were used to compute the interaction effect of rank and tenured status on
salary. The un-standardized regression coefficient of the interaction term (-1818.33) was not
significant (p = .595). Thus, the interaction of tenured status and rank is not an issue of
concern in this sample.



Table 4 Coefficients(a)

                    Un-standardized Coefficients   Standardized Coefficients
Model               B             Std. Error       Beta                        t        Sig.
1   (Constant)      20373.848     6419.941                                     3.174    .002
    V5A             -5138.081     8895.336         -.117                       -.578    .564
    V4A             17884.483     2883.434         .698                        6.202    .000
    V45A            -1818.330     3419.664         -.144                       -.532    .595

a Dependent Variable: V6, v45a=v4a*v5a, interaction effect.
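
The t column of Table 4 is simply each coefficient divided by its standard error, which can be verified with a short sketch. The interval at the end uses 1.96 as a normal approximation to the critical value; that multiplier is an assumption of this sketch, not a figure from the study.

```python
# Un-standardized coefficients and standard errors as reported in Table 4.
table4 = {
    "(Constant)": (20373.848, 6419.941),
    "V5A": (-5138.081, 8895.336),
    "V4A": (17884.483, 2883.434),
    "V45A": (-1818.330, 3419.664),
}

# t statistic = coefficient / standard error.
t_stats = {name: b / se for name, (b, se) in table4.items()}
for name, t in t_stats.items():
    print(f"{name:<11} t = {t:7.3f}")

# Approximate 95% interval for the interaction term (normal approximation).
b_int, se_int = table4["V45A"]
ci_low, ci_high = b_int - 1.96 * se_int, b_int + 1.96 * se_int
print(f"V45A interval: [{ci_low:.0f}, {ci_high:.0f}]")
```

The interval for the interaction term comfortably spans zero, which is consistent with the non-significant result reported in the text.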



Variables Entering the Regression Model 

Variables entering into the regression model are generally determined by theory in
economics. This is especially true in the econometric literature. For example, in the
Cobb-Douglas function, $Q = b_0 L^{b_1} K^{b_2} e^{u}$, where Q represents output, L the labor
input in work hours, and K the capital input in machine-hours, ln L and ln K are highly
correlated, yet theory states that both capital and labor should be included in the equation
(Johnston, 1972). In faculty salary studies, there is no such well-grounded theory to specify
which variables should be included in the equation. Institutions have specific personnel
policies to accommodate their needs. Doctoral research institutions may have different promotion
policies from community colleges. A survey of the literature, however, provides some common
background for the variables selected in regression models. Rank, discipline, merit, tenured
status, gender, and minority status are all variables recorded in the faculty personnel system
and are widely used in salary studies on campus. Discussion of the appropriate use of these
variables has consequently entered the debate in court.
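
For reference, the Cobb-Douglas function mentioned above is estimated by ordinary regression after taking natural logarithms of both sides, which is why the correlation between ln L and ln K matters. The symbols follow the function as given in the text.

```latex
\begin{align}
Q &= b_0 \, L^{b_1} K^{b_2} e^{u} \\
\ln Q &= \ln b_0 + b_1 \ln L + b_2 \ln K + u
\end{align}
```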

Rank. One of the most contested questions in variable selection for the regression
analysis is rank. In Mecklenburg v. Montana Board of Regents of Higher Education (1976), the
court rejected the use of rank because the court believed that rank was related to promotion,
which was discriminatory in nature. This argument has been widely shared by various
researchers (Ferber and Green, 1982; Scott, 1977; Gray, 1990). Confusion arises here because the
discriminatory meaning of rank has two interpretations. In the first, rank is discriminatory
because it is awarded in a fashion whereby individuals or committees on campus discriminate
against female faculty, either in the promotional process or in the outcome. This is morally
wrong and should be ameliorated. In the second, rank discriminates in a morally sound way,
because rank reflects promotion, which is objectively part of an evaluation. In light of this
debate, objections to the use of rank in the regression analysis center on the argument
that the discriminatory nature of rank will obscure the result of the salary disparity by gender.

Recent court cases, however, have asked that rank be included in the salary study (Sansonetti,
1988; Fogel, 1986). Fogel (1986) believed that rank should be included because gender
discrimination in pay could be appealed only within the same level of job category. Thus,
salaries for full professors should be higher than for associate professors. If there is a
disproportionate number of male faculty members within a higher rank category, a separate study
on rank appointment should be conducted, as Title VII indicates that discrimination in rank is
prohibited. In Sobel v. Yeshiva University (1988), the court stipulated that rank should be
included in any regression analysis.

Whether or not the variable of rank should be included in a regression analysis is an
empirical question. If rank is related to salary, then it should be included in the model.
Omitting rank from the regression model does not remove the influence of rank upon salary;
it simply leaves that influence in the residual term or in other correlated
variables in the model, such as tenured status. In an un-standardized regression
analysis, the constant term absorbs the average effect of influences left out of the model, so
part of the influence of rank would be left in the constant term if rank were omitted. A
separate regression analysis would detect any large differences in the constant terms between
the equation with the variable of rank and the equation without it. The regression coefficients
of the variables remaining in the model become either overestimated or underestimated, depending
on whether the relationship between rank and those variables is positive or negative.
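
The way an omitted variable's influence is absorbed by a correlated variable can be sketched with toy numbers. The data and coefficients below are hypothetical, and `simple_ols` is a helper written for this illustration; the identity checked is the standard omitted-variable result that the short-regression slope equals the true slope plus the omitted coefficient times the slope of the omitted variable on the included one.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: tenured faculty tend to hold higher rank.
tenure = [0, 0, 0, 1, 1, 1]
rank = [1, 1, 2, 2, 3, 3]
B_TENURE, B_RANK = 5000.0, 8000.0  # assumed "true" effects
salary = [40000.0 + B_TENURE * t + B_RANK * r for t, r in zip(tenure, rank)]

# Short regression: rank is omitted, salary is regressed on tenure alone.
short_intercept, short_slope = simple_ols(tenure, salary)

# Auxiliary regression of the omitted variable on the included one.
_, delta = simple_ols(tenure, rank)
expected_short_slope = B_TENURE + B_RANK * delta

print(f"tenure slope with rank omitted: {short_slope:.2f}")
print(f"true slope + bias term:         {expected_short_slope:.2f}")
print(f"constant with rank omitted:     {short_intercept:.2f}")
```

Because rank and tenure are positively related in this toy sample, leaving rank out overstates the tenure effect and also raises the constant term.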

Race. Race, like gender, is also a variable of contention. The question raised in salary
studies is whether or not to combine different minority groups into one category. The advantage
of combining them is that it avoids leaving only a few cases in some racial categories.
Depending on the size of the institution, one may end up with fewer than five cases in a
particular category such as Native American. It is worthwhile, however, to examine the racial
data before combining categories. Asian faculty in an engineering department
may not be at a disadvantage relative to European faculty, and combining them
with other minorities may bias the results. The general consensus seems to be that if there are
enough cases, breaking each category out is desirable. Caution is needed when there are not
enough cases for a particular racial category. The racial categories in the 1993 salary study at
the State University of New York (SUNY) were African-American, Latino and Asian (Haignere, Lin
and Eisenberg, 1993). The separation of Asian faculty from other minority groups was deemed
necessary because of the existing salary differentials among the different minority groups.

The difference among college, discipline, and department. Less controversial is the
issue of including discipline or departmental differences in faculty salary studies. In Coser v.
CoIIvier (1984), the court accepted the institution’s analysis because it included departments.
Along the same line, in Wilkins v. University of Houston (1981) the court ruled against the
plaintiffs’ analysis because it did not include the variable of department or discipline in the
regression model. In Sobel v. Yeshiva University (1988), the court asked the plaintiff to account
for the differences in salaries between departments.

If past court decisions provide any clue to the future trend, the variable of discipline
or department will be included in the study. This approach tacitly accepts the basic
assumption that disciplinary differences in salaries are legitimate since they reflect
market forces, specifically the law of supply and demand. However, some researchers are
dubious as to whether the negative association between salary levels and the percentage of
women in an academic discipline can be attributed to market forces. Academic disciplines with
higher proportions of women had the lowest growth in salary over time (Bellas, 1997; Semelroth,
1987). The same negative relationship was also found between the proportions of women in
academic disciplines and average entry-level salaries (Staub, 1987). These studies suggested that
salary differentials between genders were at least partially due to sex discrimination and not
entirely dependent on market forces. The issue of comparing the value of different disciplines on
campus seems to be embroiled in the larger issue of comparing the worth of
dissimilar work, where a practice of comparable-worth salary remains elusive.

The pertinent question is: will entering department as a variable impair the quality of the
data analysis? Using either hierarchical models or dummy variables in an ordinary regression will
significantly decrease the degrees of freedom because of the number of departments. In a
department with fewer than five faculty members, the number of faculty members would be
less than the number of variables in the equation, in which case the matrix in the regression
analysis cannot be inverted. Less dramatic is the situation where the significance test loses
its power because of the small size of the departments. Comparison by college seems to be a
desirable alternative, especially when there is not much of a salary difference among the
departments within a college.

An empirical quandary facing researchers is that separating faculty by department or by
discipline may leave the sample size of each department or discipline so small that
meaningful comparison becomes impossible.

Hierarchical Linear Models 

Hierarchical models were specifically designed to allow meaningful comparisons across
different disciplines and colleges. Social scientists have long recognized the ecological
fallacy, which occurs when one infers individual relationships from aggregate data. As Robinson
(1950) indicated long ago, one may end up analyzing the data at the wrong level when aggregate
data are the only data available. Hannan (1970) further stated that the real problem was a
general inability to construct a macro-measure that reflects the meaning of the general theory.
A correspondence between the micro-level and the macro-level is necessary in testing a theory.
Robinson (1950) found an individual correlation of .22 between race and literacy, a state
marginal correlation of .77 and a census district marginal correlation of .94. The correlation
increased with the level of aggregation. The aggregation and dis-aggregation effects detailed in
the analysis demonstrated the difficulties of synthesizing results from constructs at different
levels. The implication of the Robinson study was that area correlations depend on the number
of areas into which individuals are grouped.
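
The gap between individual-level and aggregate-level correlations can be reproduced with a small sketch. The three groups below are hypothetical values constructed for illustration, not Robinson's data, and `pearson` is a helper written for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical groups: within each group x and y move in opposite directions,
# but the group means of x and y rise together.
groups = {
    "A": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "B": ([3, 4, 5, 6], [6, 5, 4, 3]),
    "C": ([5, 6, 7, 8], [8, 7, 6, 5]),
}

# Individual-level correlation over all pooled observations.
all_x = [v for xs, _ in groups.values() for v in xs]
all_y = [v for _, ys in groups.values() for v in ys]
r_individual = pearson(all_x, all_y)

# Aggregate-level correlation computed from the group means only.
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
r_group = pearson(mean_x, mean_y)

print(f"individual-level r = {r_individual:.3f}")
print(f"group-mean r       = {r_group:.3f}")
```

An analyst who saw only the group means would infer a far stronger relationship than the pooled individual data support, and would miss the negative relationship within every group.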

Duncan and Davis (1953) studied the bounds that grouped data place on individual 
correlations. In a two-by-two table, only one cell needs to be determined in order to estimate 
the correlation, because each row and column of the table is constrained by the marginal totals, 
which also fix the maximum and minimum values of that cell. The values of the phi coefficient, 
an indicator of correlation, are therefore bounded by the marginals of a contingency table 
(Liu, 1980). The correlation dynamics became more evident when Cartwright (1969) concluded 
that small groups having negative group correlations, when combined into large groups, would 
increase the correlations of the large groups. This analysis presented mathematical proof of 
Robinson’s line of reasoning. 
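
The bounding argument can be sketched for a two-by-two table. The marginal totals below are hypothetical, and the function names are illustrative rather than taken from Duncan and Davis. Given the row and column totals, the upper-left cell can range only between a minimum and a maximum, and the phi coefficient is bounded accordingly.

```python
from math import sqrt


def cell_bounds(row1, col1, n):
    """Smallest and largest feasible count for the upper-left cell."""
    return max(0, row1 + col1 - n), min(row1, col1)


def phi(a, row1, col1, n):
    """Phi coefficient of a 2x2 table with upper-left cell a and fixed marginals.

    With b = row1 - a, c = col1 - a, d = n - row1 - col1 + a,
    the numerator a*d - b*c simplifies to a*n - row1*col1.
    """
    row2, col2 = n - row1, n - col1
    return (a * n - row1 * col1) / sqrt(row1 * row2 * col1 * col2)


# Hypothetical marginals: 40 of 100 cases in row 1, 30 of 100 in column 1.
ROW1, COL1, N = 40, 30, 100

a_min, a_max = cell_bounds(ROW1, COL1, N)
phi_min = phi(a_min, ROW1, COL1, N)
phi_max = phi(a_max, ROW1, COL1, N)
print(f"cell range: {a_min} to {a_max}")
print(f"phi range:  {phi_min:.3f} to {phi_max:.3f}")
```

Unless the marginals are very lopsided, the resulting interval tends to be wide, which is one reason the marginals alone rarely pin down the individual correlation.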

Although the inflation problem has long been recognized conceptually and widely 
discussed among social scientists, solutions were regarded as technically impossible until the 
1990s (Raudenbush and Bryk, 2002). Hierarchical linear models, which are an extension of 
general regression analysis, were introduced specifically to address the violation of the 
independence assumption in regression analysis. 

Institutional researchers are well versed in the concept of dependent and independent 
variables in the general regression model. Suppose the independent variable, socioeconomic 
status, and the dependent variable, college matriculation, are tested to see whether any 
relationship exists between them. A null hypothesis is formulated and tested against the 
alternative hypothesis with an F-test. When socioeconomic status is measured by Duncan’s 
socioeconomic index (Blau & Duncan, 1967) and the matriculation rate is measured for an age 
cohort, the reliability and validity of the measurements are well established. However, most 
offices of institutional research would be asked to incorporate home addresses into the study 
for recruiting purposes, and once they do, the assumption of independence in regression 
analysis is violated, because neighborhood information such as zip code is highly correlated 
with socioeconomic status. When observations are not independent, a hierarchical model should 
be used in lieu of ordinary regression. 

When the correlation among sampled cases is ignored in the study, the assumption of 
constant variance in multiple regression is violated. Hierarchical models were designed to 
alleviate this problem statistically. Furthermore, the model draws on the estimation of 
variance and covariance components with imbalanced, nested data (Raudenbush and Bryk, 2002). 
Such nesting is quite conceivable in faculty salary analyses, because different colleges have 
different views about reward structures. Law school faculty members generally command higher 
salaries than liberal arts faculty members. Salaries within the same college are more likely 
to be similar than salaries of faculty in different colleges because of distinct academic 
market demands. In other words, colleges are a random factor rather than a fixed factor, as 
fixed-effects analysis of variance models assume. 

The results of an unconditional random-effects model are presented in Table 5, Estimates 
of Covariance Parameters. The table gives variance estimates for the two components: the 
college effect and the residual term. 



Table 5 - Estimates of Covariance Parameters (a) 

Parameter          Estimate       Std. Error     Wald Z    Sig.    95% Confidence Interval 
                                                                   Lower Bound    Upper Bound 
Residual           210,277,883    15,828,206     13.285    .000    181,435,083    24,370,583 
College variance   669,054,389    555,234,021    1.205     .228    131,542,881    34,029,494 

a Dependent Variable: v6. 



The residual variance was 210,277,883, which reveals how much salaries vary within a 
college. The college variance was 669,054,389, roughly three times as large as the residual 
variance, which indicates the extent to which faculty salaries vary across the population of 
colleges. The intra-class correlation coefficient, 75.8%, is the proportion of the total variance 
that can be attributed to differences between colleges. The coefficient approaches 1 when nearly 
all of the salary variation can be attributed to the colleges. 



Total variance = residual variance + college variance 
Intra-class coefficient = college variance/total variance 
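
As a worked check of the two formulas above, the following sketch plugs in the variance estimates exactly as printed in Table 5. The variable names are illustrative. With these printed values the ratio comes out near 0.76, in line with the reported coefficient of roughly three quarters.

```python
# Variance estimates as printed in Table 5.
residual_variance = 210_277_883   # within-college component
college_variance = 669_054_389    # between-college component

# Total variance = residual variance + college variance
total_variance = residual_variance + college_variance

# Intra-class coefficient = college variance / total variance
intra_class = college_variance / total_variance

print(f"total variance:          {total_variance:,}")
print(f"intra-class coefficient: {intra_class:.3f}")
```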

However, the column entitled “Sig.” indicates that the college variance was not significant 
at the .05 level (p = .228). Hence, the null hypothesis that the college variance component is 
0 could not be rejected. In this study, the colleges did not exhibit differential salary 
structures among male and female faculty members. 
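
The Wald Z and Sig. columns of Table 5 can be reproduced from the estimates and their standard errors. The sketch below assumes that the significance value is a two-sided p-value from the standard normal distribution; the source does not state how the software computed it, so this is an assumption, and the function names are illustrative.

```python
from math import erfc, sqrt


def wald_z(estimate, std_error):
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return erfc(abs(z) / sqrt(2))


# Estimates and standard errors as printed in Table 5.
z_residual = wald_z(210_277_883, 15_828_206)
z_college = wald_z(669_054_389, 555_234_021)

p_residual = two_sided_p(z_residual)
p_college = two_sided_p(z_college)

print(f"residual: Z = {z_residual:.3f}, p = {p_residual:.3f}")
print(f"college:  Z = {z_college:.3f}, p = {p_college:.3f}")
```

The college variance is large, but so is its standard error, which is why the estimate fails to reach significance. A Wald test on a variance component is only a rough guide when the number of groups is small, since the normal approximation relies on large samples.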

When data are nested, inferences from hierarchical models are more valid than those from 
ordinary regression models. When the data are completely balanced, small-sample theory for 
inference holds because the test statistics follow a t distribution (Raudenbush and Bryk, 2002). 
However, balanced data, which are common in experimental designs, are almost impossible to find 
in survey research. In imbalanced cases, large-sample theory is employed in estimating the 
fixed effects and their standard errors. Inferences about variance components likewise depend 
on the sample size to justify the large-sample normal approximation. Extensions beyond the 
basic two levels are straightforward in logic but would encounter difficulties in meeting the 
sample size requirement. 
Conclusion 



Issues of equity by gender require ongoing review, and detailed information is generally 
available on campus. Regression analysis alone may not be sufficient for deciding whether a 
prima facie case of discrimination exists. Sample size is one major factor that can influence 
the results significantly, yet researchers in most cases can rely only on the cases at hand. 
In some instances, certain statistical techniques become unavailable because of insufficient 
sample size. Hierarchical linear models are extremely powerful in data analysis when data 
structures are hierarchical, and modern statistical techniques and computer programs make 
fitting such models relatively easy. However, an insufficient number of cases in the sample 
often renders this technique useless. This obstacle certainly should not preclude the use of 
regression analysis in salary equity studies, but it should caution the researcher to take 
more care with the analysis of data. 

The inferential power of regression analysis has been proven in court, and one can 
reasonably assume its use will become more prevalent in the future. Still, researchers should 
take precautions when using statistics to sift through delicate, salient topics such as gender 
equality in the higher education workforce. It is crucial that investigators take great care to 
ensure that the method and manner in which they pursue gender pay equity, or any such topic, 
are both politically feasible and logically coherent. 